Project 2: Analyzing Airbnb Data in DMV areas

DATS 6103-O10
Weirui Liu

II. Data Analyzing

Now that we have lodging data for the DMV area, let's see what patterns emerge when we look holistically at all of the lodgings we scraped.

The CSV data that we're going to analyze contains information on 900 lodgings, collected by runs of the Project2: Web Scraping notebook on Nov 9, 2020.

1. Data Cleaning and Preprocessing

1.1 Import Package

1.2 Read CSV File into DataFrame

1.3 Concatenate three Data Sets
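
The concatenation step can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the tiny in-memory frames stand in for the three CSV files, and the column names `url` and `price` are assumptions. The de-duplication step is also an assumption, suggested by the drop from 900 scraped rows to 706 unique lodgings.

```python
import pandas as pd

# Hypothetical stand-ins for the three scraped CSV files; in practice each
# frame would be loaded with pd.read_csv.
part_a = pd.DataFrame({"url": ["a1", "a2"], "price": [80, 120]})
part_b = pd.DataFrame({"url": ["b1", "a2"], "price": [95, 120]})
part_c = pd.DataFrame({"url": ["c1"], "price": [60]})

# Stack the three frames and renumber the index from 0.
lodgings = pd.concat([part_a, part_b, part_c], ignore_index=True)

# Keep the first occurrence of each listing (assumed key column: "url").
lodgings = lodgings.drop_duplicates(subset="url").reset_index(drop=True)

print(len(lodgings))
```

Using a listing identifier as the duplicate key is one design choice; comparing whole rows with a plain `drop_duplicates()` would be stricter and could keep near-duplicates.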

2. Data Exploration, Analysis and Visualization

2.1 Common Location

Now we have information on 706 unique lodgings. Let's see which locations are most common in the DMV area.

Washington appears most frequently because every lodging in the Washington DC area lists its location simply as Washington. Outside of Washington DC, most lodgings are located in Baltimore and Silver Spring in Maryland, and in Lynchburg and Stanardsville in Virginia.

2.2 Title Name Preference

The most frequent words are "Private," "Cabin," "Cozy," "near," and "DC." The word cloud above shows that most hosts use the title to describe the type of lodging. Surprisingly, many hosts also like to use the word "Cozy" in the title.
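
The word frequencies behind a word cloud like this can be approximated with a simple count. The sketch below uses only the standard library; the sample titles and the short stop-word list are made up for illustration and are not taken from the scraped data.

```python
import re
from collections import Counter

# Hypothetical listing titles standing in for the scraped title column.
titles = [
    "Cozy Private Room near DC",
    "Private Cabin in the Woods",
    "Cozy Cabin near Shenandoah",
]

# Small, assumed stop-word list; a real analysis would use a fuller one.
stop_words = {"in", "the", "a", "of", "and"}


def word_counts(titles, stop_words):
    """Count lowercase words across all titles, skipping stop words."""
    counts = Counter()
    for title in titles:
        words = re.findall(r"[a-z]+", title.lower())
        counts.update(w for w in words if w not in stop_words)
    return counts


counts = word_counts(titles, stop_words)
print(counts.most_common(4))
```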

2.3 Analysis of Property Type in DMV area

Now let's learn more about the property types in the DMV area and their median prices.

Currently, there are 31 property types in the DMV area. The most common is the private room, followed by the entire apartment and the entire house. There are also unusual property types such as Earth house, Boat, Dome house, and Castle.
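
A count and median price per property type can be computed with a grouped aggregation. The following is a minimal sketch under assumed column names (`property_type`, `price`) and invented sample rows; it is not the notebook's original code.

```python
import pandas as pd

# Hypothetical sample rows; the real data has 706 lodgings.
lodgings = pd.DataFrame({
    "property_type": ["Private room", "Private room", "Entire apartment",
                      "Entire house", "Private room", "Entire apartment"],
    "price": [50, 70, 120, 200, 60, 100],
})

# For each property type: how many listings there are and their median price,
# ordered from the most to the least common type.
summary = (
    lodgings.groupby("property_type")["price"]
    .agg(listings="count", median_price="median")
    .sort_values("listings", ascending=False)
)

print(summary)
```

The median is less sensitive than the mean to a few very expensive listings, which is presumably why the analysis reports median prices.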

Let's take a closer look at the property types in each area and their median prices.

In the Comparison of DMV Area graph, we can see the distribution of property types across the different areas. The stacked bar chart lets us compare the median price of the same property type in different areas. Use the dropdown menus to see the number of lodgings of each property type in an individual area and their median price in that area.

The figures above show that Virginia has the widest variety of property types. The number of listings is especially large for private rooms in Maryland, entire cabins in Virginia, and entire apartments and private rooms in DC. When comparing the same property type across areas, in most cases the area with more lodgings of that type also has the lower price for it.

(If you want to see the Comparison of DMV Area graph again, you have to rerun the plotting code.)

2.4 Correlation Coefficients Between Variables

The deeper the color, the more highly correlated the two variables are. As can be seen from the table above, the numbers of guests, bedrooms, beds, and baths are all correlated with the price.
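
A correlation table of this kind can be produced with pandas. The sketch below assumes numeric columns named `guests`, `bedrooms`, `beds`, `baths`, and `price` and uses invented values; it computes Pearson correlation, which may or may not be the method used in the notebook.

```python
import pandas as pd

# Hypothetical numeric columns and values, for illustration only.
lodgings = pd.DataFrame({
    "guests": [2, 4, 6, 8, 3],
    "bedrooms": [1, 2, 3, 4, 1],
    "beds": [1, 2, 4, 5, 2],
    "baths": [1.0, 1.0, 2.0, 3.0, 1.0],
    "price": [60, 110, 180, 260, 75],
})

# Pairwise Pearson correlation between all numeric columns.
corr = lodgings.corr()

# Correlation of each variable with price, strongest first.
print(corr["price"].sort_values(ascending=False))
```

A high coefficient shows that two variables move together in this sample; on its own it does not show that one causes the other.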

2.5 Analysis of Room Structure in DMV area

Next, let's look at the room structure and the number of guests that can be accommodated in different areas.

Most lodgings in the DMV area have between 1 and 3 bedrooms, beds, and baths, and most accommodate 2 to 4 guests. However, there are some exceptions: one lodging in Washington, DC has 11 baths, and one in Maryland accommodates 16 guests. Lodgings in Washington, DC tend to accommodate fewer guests than those in the other two areas.

Now let's look at the relationship between room structure and price in the DMV area.

A clear trend is evident in the 3D scatter plot: as the numbers of bedrooms, beds, and guests increase, so do the prices of lodgings. Lodgings with many baths are either expensive (lodgings that can accommodate multiple guests, such as an entire house) or cheap (lodgings that cater to a single guest, such as a shared room). Moreover, even among lodgings with the same number of guests and similar room structures, there are still significant price differences.

2.6 Analysis of Amenities and Facilities of Lodging in DMV area

Next, let's look at the amenities and facilities that are most common in lodgings in the DMV area.

The most common amenities and facilities in lodgings in the DMV area include air conditioning, wifi, kitchen, and free parking.

Let's take a closer look at the proportion of each amenity and facility in the different areas.

In the Comparison of DMV Area graph, we can compare the percentage of lodgings offering each amenity and facility in different areas. Use the dropdown menus to see the proportions in an individual area.

The figures above show that lodgings in Virginia offer washers and wifi far less often than those in the other two areas. Not surprisingly, free parking is far less available in Washington DC than in the other two areas.

(If you want to see the Comparison of DMV Area graph again, you have to rerun the plotting code.)

Here, "popular" means the lodgings with the highest ratings and the most reviews, and "best price" means the lowest-priced lodgings among those. We compare the top ten most popular, best-priced lodgings in each area and look at whether they differ in property type, room structure, the amenities and facilities they offer, and average price.
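
One way to read this definition is as a two-step ranking: sort by rating and review count to find popular lodgings, then sort those by price. The sketch below follows that reading. The column names, the sample rows, and the helper names `most_popular` and `best_price` are all hypothetical, and the notebook may weigh rating and reviews differently.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
lodgings = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E"],
    "area": ["DC", "DC", "DC", "MD", "MD"],
    "rating": [4.9, 4.9, 4.7, 5.0, 4.8],
    "reviews": [350, 400, 500, 120, 310],
    "price": [90, 120, 70, 65, 80],
})


def most_popular(df, area, n=10):
    """Top n lodgings in an area by rating, with review count as tiebreaker."""
    in_area = df[df["area"] == area]
    ranked = in_area.sort_values(["rating", "reviews"], ascending=[False, False])
    return ranked.head(n)


def best_price(df, area, n=10, keep=3):
    """Cheapest `keep` lodgings among the n most popular in an area."""
    return most_popular(df, area, n).sort_values("price").head(keep)


print(most_popular(lodgings, "DC", n=2))
print(best_price(lodgings, "DC", n=2, keep=1))
```

Sorting by rating first means a lodging with a perfect rating and few reviews outranks a slightly lower-rated one with many reviews; a minimum review count would be one way to temper that.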

Private rooms and entire apartments are popular in every area. Popular lodgings generally accommodate two to four guests and have one or two bedrooms, beds, and baths. They all offer air conditioning, wifi, and a kitchen, and, except for those in DC, free parking as well. Most are rated above 4.8 and have at least 300 reviews, with average prices ranging from 60 to 140 dollars.

2.7 Interactive

Create an interactive program that takes an area, number of guests, rating, and price and returns a list of recommended lodgings with the best price.
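
The core of such a program is a filter over the lodging table. The sketch below is a non-interactive version under several assumptions: the function name `recommend`, the column names, and the sample rows are hypothetical, and the rating and price inputs are treated as a minimum rating and a maximum price. The notebook's actual program may collect its inputs and apply them differently.

```python
import pandas as pd


def recommend(df, area, guests, min_rating, max_price, top=5):
    """Return up to `top` matching lodgings, cheapest first."""
    mask = (
        (df["area"] == area)
        & (df["guests"] >= guests)        # must fit the whole party
        & (df["rating"] >= min_rating)    # assumed: rating is a lower bound
        & (df["price"] <= max_price)      # assumed: price is an upper bound
    )
    matches = df[mask]
    # Cheapest first; among equal prices, higher rating first.
    return matches.sort_values(["price", "rating"], ascending=[True, False]).head(top)


# Hypothetical sample data, for illustration only.
lodgings = pd.DataFrame({
    "title": ["Cozy room", "Loft", "Studio", "Cabin", "Townhouse"],
    "area": ["DC", "DC", "DC", "VA", "DC"],
    "guests": [2, 4, 2, 6, 4],
    "rating": [4.9, 4.8, 4.5, 4.9, 4.9],
    "price": [75, 130, 60, 150, 110],
})

result = recommend(lodgings, "DC", guests=2, min_rating=4.8, max_price=120)
print(result[["title", "price", "rating"]])
```

In a notebook, the four arguments could be collected with `input()` prompts or widgets and passed to the same function.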

3. Conclusion

For customers: if you want to know whether the lodging you have selected is a good deal, refer to the property type analysis and the room structure analysis together to see the median price for that property type and the price of lodgings with the same room structure in the area. You can then judge whether you are overpaying. Refer to the amenities and facilities analysis to compare what your lodging offers with what is common in the area. Or simply use the interactive program to filter popular lodgings with the best price.

For hosts: if you want to become a lodging host in the DMV area and want your lodging to appeal to most people, I have the following suggestions based on our latest data analysis:

  1. When naming your lodging, specify the type of lodging and add some adjectives, such as "Cozy," to appeal to customers.
  2. Refer to the property type analysis section and the room structure analysis section to see which types of lodging and room structures are more popular and how to price yours to have a price advantage in the area.
  3. Refer to the amenities and facilities analysis section to see which amenities and facilities should be provided to have more competitive advantages.
  4. To decide the final pricing, combine the points above with the popular lodging analysis.

4. Learning Processes

This was the first time I used web scraping to grab the information I wanted from a web page, and I had no idea how it worked beforehand. So I spent a lot of time watching web scraping tutorials, and I practiced on the Yelp website. While scraping, I also learned to use regular expressions to extract the desired text. It was difficult, but I felt a sense of achievement when it was done.

I learned to show multiple graphs with dropdown menus and subplots. I did a better job optimizing chart details and hover text than in the last project. I learned to use more kinds of visualization to present the data, such as word cloud, squarify, 3D scatter plot, and scatterpolar. Finally, I made an interactive program based on what I learned in class.

Now that we know a little more about the distribution of property types, room structures, and the proportions of amenities and facilities in the DMV area, and have summarized the general attributes of popular lodgings with the best price, hopefully this will help you better understand Airbnb lodging in the DMV, whether you're a customer or a host.

We can look at lodgings' information by specific places to stay.

If you want to see more, please follow the links to the Zenodo page, or the Web Scraping and Analysis GitHub pages.